Map Help for Final Project

In this project you may be tempted to create a map for your analysis. Things are not always straightforward, so we have created this additional resource to help you if you are feeling ambitious.

In [1]:
# Importing in your required libraries
import pandas as pd
import altair as alt
alt.data_transformers.enable('default', max_rows=1000000)
import json

Remember: if your notebook enables the data server with alt.data_transformers.enable('data_server'), comment that line out before you export the notebook to an HTML file. Otherwise the visualizations will not appear in the exported file.

Let's bring in the data. For reference, this data is a subset of the original data available on The Vancouver Data Portal. The data that we have given you was adapted from the json file and wrangled slightly so that it's easy to use for geographical visualizations.

In [2]:
df = pd.read_csv('vancouver_trees.csv')
In [3]:
df.head()
Out[3]:
std_street on_street species_name neighbourhood_name date_planted diameter street_side_name genus_name assigned civic_number plant_area curb tree_id common_name height_range_id on_street_block cultivar_name root_barrier latitude longitude
0 W 13TH AV MAPLE ST PSEUDOPLATANUS Kitsilano NaN 9.00 EVEN ACER N 1996 10 Y 13310 SYCAMORE MAPLE 4 2900 NaN N 49.259856 -123.150586
1 WALES ST WALES ST PLATANOIDES Renfrew-Collingwood 2018-11-28 3.00 ODD ACER N 5291 7 Y 259084 PRINCETON GOLD MAPLE 1 5200 PRINCETON GOLD N 49.236650 -123.051831
2 W BROADWAY W BROADWAY RUBRUM Kitsilano 1996-04-19 14.00 EVEN ACER N 3618 C Y 167986 KARPICK RED MAPLE 3 3600 KARPICK N 49.264250 -123.184020
3 PENTICTON ST PENTICTON ST CALLERYANA Renfrew-Collingwood 2006-03-06 3.75 EVEN PYRUS N 2502 5 Y 213386 CHANTICLEER PEAR 1 2500 CHANTICLEER Y 49.261036 -123.052921
4 RHODES ST RHODES ST GLYPTOSTROBOIDES Renfrew-Collingwood 2001-11-01 3.00 ODD METASEQUOIA N 5639 N Y 189223 DAWN REDWOOD 2 5600 NaN N 49.233354 -123.050249

We could use this data to make many different visualizations (with additional wrangling), but this resource focuses on how to make maps with it.

Altair does not make Vancouver easy to locate on a world map, and there is no ready-made projection for Canada like there is for the United States, so we've made the GeoJSON for Vancouver and its neighbourhoods available through a URL. This file was also obtained from the Vancouver Data Portal.

To make a base map of Vancouver we use the geojson url saved in url_geojson.

In [4]:
url_geojson = 'https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/local-area-boundary.geojson'

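Next, we wrap the URL in alt.Data() and describe its format: the file is read as JSON (type='json'), and the GeoJSON shapes are taken from its features property (property='features').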

In [5]:
data_geojson_remote = alt.Data(url=url_geojson, format=alt.DataFormat(property='features',type='json'))

data_geojson_remote
Out[5]:
Data({
  format: DataFormat({
    property: 'features',
    type: 'json'
  }),
  url: 'https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/local-area-boundary.geojson'
})
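To make the property='features' setting more concrete, here is a small plain-Python sketch of the structure it refers to. The GeoJSON text is a hand-written, hypothetical FeatureCollection with made-up coordinates, not the real Vancouver file; it only illustrates that the shapes live in a list under the top-level features key.

```python
import json

# Hypothetical, hand-written FeatureCollection (NOT the real Vancouver file).
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Killarney"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[-123.06, 49.20], [-123.02, 49.20],
                                   [-123.02, 49.23], [-123.06, 49.23],
                                   [-123.06, 49.20]]]}},
    {"type": "Feature",
     "properties": {"name": "Downtown"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[-123.14, 49.27], [-123.10, 49.27],
                                   [-123.10, 49.29], [-123.14, 49.29],
                                   [-123.14, 49.27]]]}}
  ]
}
"""

collection = json.loads(geojson_text)

# property='features' points at this list: one entry per shape.
features = collection["features"]

# Each feature keeps its attributes under "properties".
names = [feature["properties"]["name"] for feature in features]
print(names)
```

Each entry in the list is one shape, and its attributes (such as the neighbourhood name) sit under properties.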

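We can then build the base map of Vancouver from the data_geojson_remote object, much as we've made maps in the past. This time, though, we pass two arguments to .project(): type='identity', which treats the longitude and latitude values as plain x/y coordinates rather than applying a geographic projection, and reflectY=True, which flips the vertical axis. Without reflectY=True, our map of Vancouver is drawn upside down.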

In [6]:
vancouver_map = alt.Chart(data_geojson_remote).mark_geoshape(
    color = 'gray', opacity= 0.5, stroke='white').encode(
).project(type='identity', reflectY=True)

vancouver_map
Out[6]:

Nice, we have a base map of Vancouver! 🎉
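For intuition on why the map flips without reflectY=True, here is a simplified sketch. On screen, pixel rows count downward from the top, while latitude increases toward the north. The helper to_pixel_y and the latitude bounds are hypothetical and only model the idea; this is not how Vega implements the projection internally.

```python
def to_pixel_y(lat, lat_min, lat_max, height, reflect_y):
    """Map a latitude to a pixel row, where row 0 is the TOP of the image."""
    # 0.0 at the southern edge, 1.0 at the northern edge.
    fraction = (lat - lat_min) / (lat_max - lat_min)
    if reflect_y:
        # Flip the axis so that larger latitudes land nearer the top.
        return height * (1 - fraction)
    return height * fraction


# Hypothetical bounds roughly spanning the city, and a 100-pixel-tall chart.
lat_min, lat_max, height = 49.20, 49.30, 100
north, south = 49.29, 49.21

y_north_plain = to_pixel_y(north, lat_min, lat_max, height, reflect_y=False)
y_south_plain = to_pixel_y(south, lat_min, lat_max, height, reflect_y=False)
y_north_flipped = to_pixel_y(north, lat_min, lat_max, height, reflect_y=True)
y_south_flipped = to_pixel_y(south, lat_min, lat_max, height, reflect_y=True)

# Without reflection the northern point gets the larger row number (lower on screen).
print(y_north_plain > y_south_plain)
# With reflection the northern point gets the smaller row number (higher on screen).
print(y_north_flipped < y_south_flipped)
```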

Now all we have to do is combine this with some of our tree data.

Let's plot the median diameter of the tree trunks for each neighbourhood.

I'm going to rename neighbourhood_name to name, since that's what it's called in the GeoJSON file, and we need a matching key to connect the shapes to our dataframe with transform_lookup().

I'm also going to keep the median latitude and longitude for each neighbourhood, as I'm going to make a point map using these coordinates afterwards.

In [7]:
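# Select the numeric columns before taking the median; recent pandas versions
# raise an error if .median() is applied to text columns such as std_street.
median_df = (
    df.groupby('neighbourhood_name')[['diameter', 'latitude', 'longitude']]
    .median()
    .reset_index()
    .rename(columns={'neighbourhood_name': 'name'})
)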
median_df
Out[7]:
    name                      diameter   latitude    longitude
0   Arbutus-Ridge               10.000  49.248710  -123.162175
1   Downtown                     7.000  49.279975  -123.119297
2   Dunbar-Southlands           12.000  49.245220  -123.184440
3   Fairview                    10.500  49.263544  -123.130863
4   Grandview-Woodland          10.000  49.272469  -123.064278
5   Hastings-Sunrise            10.000  49.274245  -123.043578
6   Kensington-Cedar Cottage    10.000  49.246389  -123.074296
7   Kerrisdale                  11.000  49.227908  -123.153492
8   Killarney                    8.750  49.222751  -123.037832
9   Kitsilano                   13.000  49.264121  -123.162942
10  Marpole                      9.000  49.212933  -123.131384
11  Mount Pleasant              11.625  49.261609  -123.098054
12  Oakridge                     9.000  49.227719  -123.126659
13  Renfrew-Collingwood          7.750  49.246218  -123.040136
14  Riley Park                  10.000  49.247340  -123.102256
15  Shaughnessy                 12.500  49.246229  -123.141200
16  South Cambie                 9.650  49.249015  -123.120555
17  Strathcona                   9.000  49.278289  -123.090066
18  Sunset                       9.500  49.221793  -123.092664
19  Victoria-Fraserview          9.000  49.221164  -123.063417
20  West End                    11.000  49.286253  -123.134223
21  West Point Grey             12.000  49.263790  -123.205870

This now gives us the median lat and long coordinates as well as tree trunk diameter per Vancouver neighbourhood.
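As a rough illustration of what a per-neighbourhood median computes, here is a dependency-free sketch on a handful of illustrative rows (the values are hypothetical, chosen only to keep the arithmetic easy to follow). One thing it makes visible: each column's median is taken independently, so the median latitude and median longitude together need not match the position of any single tree.

```python
from collections import defaultdict
from statistics import median

# Illustrative rows: (neighbourhood, diameter, latitude, longitude).
trees = [
    ("Killarney", 3.5, 49.2179, -123.0481),
    ("Killarney", 4.0, 49.2156, -123.0382),
    ("Killarney", 13.0, 49.2127, -123.0415),
    ("Kitsilano", 9.0, 49.2599, -123.1506),
    ("Kitsilano", 14.0, 49.2643, -123.1840),
]

# Group the numeric values by neighbourhood.
grouped = defaultdict(list)
for name, diameter, lat, lon in trees:
    grouped[name].append((diameter, lat, lon))

# zip(*rows) turns the list of rows into columns; take the median of each column.
medians = {
    name: tuple(median(column) for column in zip(*rows))
    for name, rows in grouped.items()
}

print(medians["Killarney"])
```

In this toy data the Killarney median pairs the latitude of the second tree with the longitude of the third, so the resulting point is a summary location rather than a real tree.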

Now we link the GeoJSON shapes with the median tree dataframe using a lookup, like we learned in Module 6.

The neighbourhood name is stored in each feature's properties field, which we can access using properties.name.

We then grab the diameter and name columns from median_df using LookupData().

We color the neighbourhoods based on trunk diameter size.

In [8]:
alt.Chart(data_geojson_remote).mark_geoshape().transform_lookup(
    lookup='properties.name',
    from_=alt.LookupData(median_df, 'name', ['diameter', 'name'])).encode(
    color='diameter:Q',
    tooltip='name:N').project(type='identity', reflectY=True)
Out[8]:

Look at that! We have a choropleth map!
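To make the lookup step more concrete, here is a plain-Python sketch of what a lookup does conceptually: for each shape, find the row whose key matches and copy the requested fields across. The features list is a hypothetical stand-in for the GeoJSON features (geometry omitted), and the third name is deliberately invented to show what happens when nothing matches. This models the idea only and is not Altair's implementation.

```python
# Hypothetical stand-ins for GeoJSON features (geometry omitted for brevity).
features = [
    {"properties": {"name": "Kitsilano"}},
    {"properties": {"name": "Downtown"}},
    {"properties": {"name": "Not In Table"}},  # invented name with no match
]

# Two rows in the shape of median_df.
median_rows = [
    {"name": "Kitsilano", "diameter": 13.0},
    {"name": "Downtown", "diameter": 7.0},
]

# Index the table by its key column, like LookupData(median_df, 'name', ...).
index = {row["name"]: row for row in median_rows}
fields = ["diameter", "name"]

joined = []
for feature in features:
    key = feature["properties"]["name"]  # the 'properties.name' lookup key
    match = index.get(key, {})
    # Copy the requested fields; unmatched shapes get None for each field.
    extra = {field: match.get(field) for field in fields}
    joined.append({**feature, **extra})

for row in joined:
    print(row["properties"]["name"], row["diameter"])
```

A shape with no matching row ends up with an empty value, which is why the names in the dataframe have to match the names in the GeoJSON exactly.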

We learned that these can sometimes be a bit deceiving, so we can use point size to show trunk diameter instead.

In [9]:
points = alt.Chart(median_df).mark_circle().encode(
    longitude='longitude',
    latitude='latitude',
    size='diameter:Q',
    color = 'diameter:Q',
    tooltip='name').project(type= 'identity', reflectY=True)

points
Out[9]:

And overlay it on our base Vancouver map.

In [10]:
(vancouver_map + points).configure_view(stroke=None)
Out[10]:

In addition we could plot all the trees in the dataset using the latitude and longitude of each row/tree in the full dataframe.

In [11]:
points = alt.Chart(df).mark_circle(size=1, color='green').encode(
    longitude='longitude',
    latitude='latitude').project(type= 'identity', reflectY=True)

(vancouver_map + points).configure_view(stroke=None)
Out[11]:

You can also plot an individual neighbourhood if you wish. Here we use a GeoJSON file that contains only the Killarney boundary and filter the tree data to match.

In [12]:
url_geojson_killarney = 'https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/vancouver_neighbourhoods/killarney.geojson'
In [13]:
data_geojson_remote_kil = alt.Data(url=url_geojson_killarney, format=alt.DataFormat(property='features',type='json'))
In [14]:
killarney_map = alt.Chart(data_geojson_remote_kil).mark_geoshape(
    color = 'gray', opacity= 0.5, stroke='white').encode(
).project(type='identity', reflectY=True)

killarney_map
Out[14]:
In [15]:
df_kil = df[df['neighbourhood_name'] == 'Killarney']
df_kil.head()
Out[15]:
std_street on_street species_name neighbourhood_name date_planted diameter street_side_name genus_name assigned civic_number plant_area curb tree_id common_name height_range_id on_street_block cultivar_name root_barrier latitude longitude
34 ROSEMONT DRIVE ROSEMONT DRIVE BETULUS Killarney 2009-11-13 3.5 ODD CARPINUS N 2855 4 Y 227041 PYRAMIDAL EUROPEAN HORNBEAM 2 2800 FASTIGIATA Y 49.217863 -123.048077
43 ROSEMONT DRIVE ROSEMONT DRIVE TRUNCATUM Killarney 2009-11-17 4.0 ODD ACER N 3221 NaN Y 225492 PACIFIC SUNSET MAPLE 1 3200 PACIFIC SUNSET Y 49.215578 -123.038159
44 ROSEMONT DRIVE ROSEMONT DRIVE TRUNCATUM Killarney 2009-11-19 4.0 ODD ACER Y 3155 4 Y 225481 PACIFIC SUNSET MAPLE 1 3100 PACIFIC SUNSET Y 49.215569 -123.038612
65 KERR ST KERR ST EUCHLORA X Killarney NaN 13.0 EVEN TILIA N 7906 7 Y 87209 CRIMEAN LINDEN 4 7900 NaN N 49.212686 -123.041534
114 MARMION AV HAROLD ST CERASIFERA Killarney 1999-11-04 4.0 EVEN PRUNUS N 3255 N Y 180669 NIGHT PURPLE LEAF PLUM 2 5700 NIGRA N 49.231903 -123.035914
In [16]:
points_kil = alt.Chart(df_kil).mark_circle(size=5, color='green').encode(
    longitude='longitude',
    latitude='latitude').project(type= 'identity', reflectY=True)

points_kil
Out[16]:
In [17]:
(killarney_map + points_kil).configure_view(stroke=None)
Out[17]: